Abstract:
Alternative splicing of pre-mRNAs is a major mechanism for generating protein diversity in higher eukaryotes. One of the major forms of alternative splicing is exon skipping, in which a specific exon is skipped during splicing. In previous work, Miriami and colleagues used a statistical method to identify two motifs, in the upstream and downstream introns respectively, that were associated with exon skipping events. In this study, we employed a pattern branching motif finding algorithm and an approximate mining of consensus sequences algorithm. The data mining approach developed in this study can exhaustively discover all candidate motifs, making it a more complete approach than Miriami's statistical one. Furthermore, given the genetic similarity between human and mouse, we compared the motifs found for the two species and discovered patterns that are very likely associated with exon skipping in both human and mouse.
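As an illustration of the branching idea, the following Python sketch (a hypothetical simplification, not the implementation used in the study) starts from each k-mer occurring in the sequences and greedily moves to the best-scoring single-mutation neighbor, scoring a candidate motif by its total Hamming distance to its closest match in each sequence:

```python
from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def score(motif, sequences):
    # total distance from the motif to its best match in each sequence
    k = len(motif)
    return sum(min(hamming(motif, seq[i:i + k])
                   for i in range(len(seq) - k + 1))
               for seq in sequences)

def neighbors(motif):
    # all patterns within Hamming distance 1 of the motif
    for i, base in product(range(len(motif)), "ACGT"):
        if motif[i] != base:
            yield motif[:i] + base + motif[i + 1:]

def pattern_branching(sequences, k, rounds=2):
    best, best_score = None, float("inf")
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            motif = seq[i:i + k]
            for _ in range(rounds):  # branch toward better neighbors
                cand = min(neighbors(motif), key=lambda m: score(m, sequences))
                if score(cand, sequences) >= score(motif, sequences):
                    break
                motif = cand
            s = score(motif, sequences)
            if s < best_score:
                best, best_score = motif, s
    return best, best_score
```

For example, on three sequences each containing the exact pattern `GATTACA`, `pattern_branching(seqs, 7)` returns that pattern with score 0; with mutated occurrences, the branching step recovers a consensus no starting k-mer matches exactly.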
Abstract:
In this article we show how to extend object constraint languages with reflection. We choose OCL (the Object Constraint Language) and extend it with operators for reification and reflection. We show how to give precise semantics to the extended language OCL_R by elaborating the necessary type derivation rules and value specifications. A driving force for introducing reflection capabilities into a constraint language is the investigation of the semantics and pragmatics of modeling constructs. We exploit the resulting reflective constraint language in modeling domains that include sets of sets of domain objects, and we give precise semantics to UML power types. We carve out the notion of sustainable constraint writing, which is about making models robust against unwanted updates; reflective constraints are an enabler for sustainable constraint writing. We discuss the potential of sustainable constraint writing for emerging tools and technologies. For this purpose, we introduce a symbolic viewpoint of information system modeling.
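To make the power-type idea concrete outside OCL, a minimal Python analogue (illustrative names and classes, not OCL_R syntax) can check the typical power-type invariant: the instance sets of the classifiers must partition the instances of the base class, i.e., be pairwise disjoint and jointly exhaustive.

```python
class Species:
    """Toy classifier: an instance of a power type, holding the
    extent (set of base-class instances) it classifies."""
    def __init__(self, name):
        self.name = name
        self.instances = set()

    def add(self, obj):
        self.instances.add(obj)
        return self

def is_partition(classifiers, all_objects):
    """Power-type invariant: the classifiers' extents are pairwise
    disjoint and together cover exactly the base-class extent."""
    covered = [o for c in classifiers for o in c.instances]
    disjoint = len(covered) == len(set(covered))
    exhaustive = set(covered) == set(all_objects)
    return disjoint and exhaustive
```

A constraint like this is what a reflective language can state once over the classifier level instead of repeating it per concrete class.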
Abstract:
The data warehousing approach aims to exploit very large volumes of data to make relevant decisions. In this paper, we deal with object-oriented data warehouse design. More precisely, we present an object-oriented data warehouse model integrating temporal and archive data, and we provide functions allowing the administrator to specify a data warehouse from a global source schema.
Abstract:
Linked Data initiatives have encouraged the publication of a large number of RDF datasets created independently by different data providers. These datasets can be accessed through different Web interfaces, e.g., SPARQL endpoints; however, federated query engines are still required to provide an integrated view of them. Given the large number of Web-accessible RDF datasets, federated SPARQL query engines implement query processing techniques to effectively select the relevant datasets that provide the data required to answer a query. Existing federated query engines usually rely on coarse-grained description methods, where datasets are characterized by their vocabularies or schemas while details about the data in each dataset, e.g., classes, properties, or relations, are ignored. This lack of source description may lead to the erroneous selection of data sources for a query and to unnecessary data retrieval and source communication, thus affecting the performance of query processing over the federation. We address the problem of federated SPARQL query processing and devise MULDER, a query engine for federations of RDF data sources. MULDER describes data sources in terms of an abstract description of the entities belonging to the same RDF class, dubbed an RDF molecule template, and utilizes these templates for source selection, query decomposition, and optimization. We empirically study the performance and continuous efficiency of MULDER on existing benchmarks and compare it with existing federated SPARQL query engines. The experimental results suggest that RDF molecule templates empower MULDER to select RDF data sources in a way that not only reduces execution time but also increases answer completeness and continuous efficiency.
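The role of molecule templates in source selection can be sketched as follows; the data structures and function names here are hypothetical illustrations, not MULDER's actual API. Each source is described by, per RDF class, the set of properties it can answer, and a subquery over a class is routed only to sources whose template covers at least one of its properties:

```python
# Per-source molecule templates: class -> properties the source holds.
templates = {
    "src1": {"Person": {"name", "birthPlace"}},
    "src2": {"Person": {"name", "knows"}, "City": {"population"}},
}

def select_sources(rdf_class, properties, templates):
    """Return the sources whose molecule template for rdf_class covers
    at least one requested property (candidates for this subquery)."""
    return {
        src for src, molecules in templates.items()
        if properties & molecules.get(rdf_class, set())
    }
```

A schema-level description would send every `Person` subquery to both sources; the property-level template lets the engine skip `src1` for a `knows` pattern, reducing requests and intermediate results.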
Abstract:
Information and communication technology (ICT) is impacting our daily lives more than ever before. Many existing applications guide users in their daily activities (e.g., navigating through traffic, monitoring health, managing home comfort, socializing with others). Although these applications differ in purpose and application domain, they all detect events and propose actions and decision-making aid to users. However, there is no common backbone for event detection that can be instantiated, re-used, and reconfigured across use cases. In this paper, we propose eVM, a generic event Virtual Machine able to detect events in different contexts while allowing domain experts to model and define the targeted events prior to detection. eVM simultaneously considers the various features of the defined events (e.g., temporal, geographical) and uses them to detect different feature-centric events (e.g., time-centric, location-centric). eVM is based on several components (an event query language, a query compiler, an event detection core, etc.); here we mainly detail the event detection modules. We show that eVM is re-usable in different contexts and that the performance of our prototype is quasi-linear in most cases. Our experimental results also show that detection accuracy improves when, besides spatio-temporal information, other features are considered.
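A toy sketch of what "feature-centric" detection means (illustrative names only, not eVM's components or query language): raw observations carry several features, and the same detector groups them by whichever feature the event definition targets, e.g., bucketing by time window for time-centric events or by place for location-centric ones:

```python
from collections import defaultdict

observations = [
    {"time": 10, "place": "kitchen", "type": "motion"},
    {"time": 11, "place": "kitchen", "type": "heat"},
    {"time": 40, "place": "garage",  "type": "motion"},
]

def detect(observations, feature, window):
    """Group observations into candidate events along one feature:
    numeric time is bucketed into windows, other features are used as-is."""
    events = defaultdict(list)
    for obs in observations:
        key = obs["time"] // window if feature == "time" else obs[feature]
        events[key].append(obs["type"])
    return dict(events)
```

With a 30-unit window, the time-centric view groups the two kitchen readings into one event; the location-centric view over the same data separates kitchen from garage activity without re-ingesting anything.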
Abstract:
In this paper, we propose a learning approach to adaptive performance tuning of database applications. The objective is to validate the opportunity to devise a tuning strategy that does not need prior knowledge of a cost model. Instead, the cost model is learned through reinforcement learning. We instantiate our approach to the use case of index tuning. We model the execution of queries and updates as a Markov decision process whose states are database configurations, actions are configuration changes, and rewards are functions of the cost of configuration change and query and update evaluation. During the reinforcement learning process, we face two important challenges: the unavailability of a cost model and the size of the state space. To address the former, we iteratively learn the cost model, in a principled manner, using regularization to avoid overfitting. To address the latter, we devise strategies to prune the state space, both in the general case and for the use case of index tuning. We empirically and comparatively evaluate our approach on a standard OLTP dataset. We show that our approach is competitive with state-of-the-art adaptive index tuning, which is dependent on a cost model.
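The MDP formulation can be illustrated with a deliberately tiny tabular Q-learning loop; the index names, cost numbers, and the hand-written `eval_cost` below are toy stand-ins (the paper learns the cost model and prunes the state space, neither of which is shown here). States are index configurations, an action moves to a neighboring configuration, and the per-step cost combines the configuration change with query evaluation:

```python
import random

INDEXES = ("idx_a", "idx_b")

def eval_cost(config):
    # stand-in for the learned cost model: queries are cheaper with idx_a
    return 1.0 if "idx_a" in config else 5.0

def change_cost(old, new):
    return 2.0 * len(old ^ new)  # building or dropping an index has a price

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.2):
    """Tabular Q-learning over index configurations (values are costs,
    so lower is better and the greedy policy takes the minimum)."""
    q = {}
    state = frozenset()
    for _ in range(episodes):
        actions = [state | {i} for i in INDEXES] + [state - {i} for i in INDEXES]
        if random.random() < eps:                          # explore
            nxt = random.choice(actions)
        else:                                              # exploit
            nxt = min(actions, key=lambda a: q.get((state, a), 0.0))
        cost = change_cost(state, nxt) + eval_cost(nxt)
        follow = [nxt | {i} for i in INDEXES] + [nxt - {i} for i in INDEXES]
        best_next = min(q.get((nxt, a), 0.0) for a in follow)
        q[(state, nxt)] = ((1 - alpha) * q.get((state, nxt), 0.0)
                           + alpha * (cost + gamma * best_next))
        state = nxt
    return q
```

The trade-off the paper describes is visible even here: paying the one-off change cost to build `idx_a` is rewarded by a lower evaluation cost on every later step.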
Abstract:
Automating the life cycle of data management projects is a challenging issue that has attracted the interest of both academic researchers and industrial companies. Accordingly, several commercial and academic tools have been proposed for use in a broad range of contexts. However, when dealing with data generated from connected environments (e.g., smart homes, smart cities), data acquisition and management become more complex and heavily dependent on the environmental context, rendering traditional tools less efficient and appropriate. In this respect, we introduce OpenCEMS, an open platform for data management and analytics that can be used in various application domains and contexts, and more specifically for designing connected environments and analysing their generated or simulated data. OpenCEMS provides a wide array of functionalities, ranging from data pre-processing to post-processing, that allow users to represent and manage data from the different components of a connected environment (e.g., hardware, software) and to define the interactions between them. This makes it possible both to simulate data with respect to different parameters and to contextualise data collected from the connected devices (i.e., to take environmental and sensing contexts into account). In this paper, we compare OpenCEMS with existing solutions and show how data is represented and processed.
Abstract:
Several distributed storage solutions that do not rely on a central server have been proposed over the last few years. Most of them are deployed on public networks on the Internet. However, these solutions often do not provide an access rights mechanism enabling users to control who can access a specific file or piece of data. In this article, we propose Mutida (from the Latin word "aditum", meaning "access"), a protocol that allows the owner of a file to delegate access rights to another user. This access right can then be delegated to a computing node to process the piece of data. The mechanism relies on encryption of the data, on public key/value pair storage to register the access control list, and on a function executed locally by the nodes to compute the decryption key. After presenting the mechanism, its advantages, and its limitations, we show that the proposed mechanism has similar functionalities to Wave, an authorization framework with transitive delegation. However, Wave does not require fully trusted nodes. We implement our approach as a Java program and evaluate it on the Grid'5000 testbed. We compare our approach with one based on a protocol relying on Shamir key reconstruction, which provides similar features.
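The delegation flow can be illustrated with a deliberately simplified Python sketch. Everything here is a toy stand-in for illustration: the XOR "cipher", the hash-based key derivation, and all the names are hypothetical, not Mutida's actual cryptographic constructions. The structure, however, mirrors the abstract: encrypted data, a public key/value store holding the access control list, and a key-computation function run locally by the node:

```python
import hashlib

kv_store = {}                                  # public key/value storage
NODE_SECRET = b"secret-held-by-storage-nodes"  # input to the local key function

def file_key(file_id):
    # the function a node runs locally to compute the decryption key
    return hashlib.sha256(NODE_SECRET + file_id.encode()).digest()

def xor_cipher(data, key):
    # toy symmetric cipher: XOR with a repeating key (illustration only)
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def publish(file_id, data, owner):
    # the owner stores the encrypted blob plus a public access control list
    kv_store[file_id] = {"acl": {owner},
                         "blob": xor_cipher(data, file_key(file_id))}

def delegate(file_id, grantee):
    # the owner registers a new entry in the public ACL
    kv_store[file_id]["acl"].add(grantee)

def read(file_id, user):
    entry = kv_store[file_id]
    if user not in entry["acl"]:
        raise PermissionError(f"{user} has no access right on {file_id}")
    return xor_cipher(entry["blob"], file_key(file_id))  # XOR is its own inverse
```

In the real protocol the ACL check and key computation cannot be bypassed by reading the store directly, which is precisely what the cryptographic construction (absent from this sketch) guarantees.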